Fast semi-automatic semantic annotation for spoken dialog systems
Abstract
This paper describes a bootstrapping methodology for semi-automatic semantic annotation of a “mini-corpus”, which is conventionally annotated by hand to train the initial parser used in natural language understanding (NLU) systems. We propose to cast semantic annotation as a classification problem: each word is assigned a unique set of semantic tag(s) and/or label(s) from the universal tag/label set. This approach enables “local” semantic annotation, which results in partially annotated sentences. The proposed method reduces annotation time and cost, which form a major bottleneck in the development of NLU systems. We present a set of experiments conducted on a medical-domain “mini-corpus” that contains 10K hand-annotated sentences. Three annotation methods are compared: parser-based (baseline), similarity-based and classification-based annotation. The support vector machine (SVM) based classification scheme is shown to outperform both similarity-based and parser-based annotation.
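The classification framing in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the paper's implementation: the context-window features, the toy sentences and tag names, the `margin` threshold used to leave uncertain words untagged, and above all the classifier, which is a plain multiclass perceptron standing in for the SVM named in the abstract.

```python
from collections import defaultdict

def features(words, i):
    """Hypothetical context-window features for the word at position i."""
    prev_w = words[i - 1] if i > 0 else "<s>"
    next_w = words[i + 1] if i < len(words) - 1 else "</s>"
    return [f"w={words[i]}", f"prev={prev_w}", f"next={next_w}"]

def score(weights, tag, feats):
    """Linear score of one tag for one word's feature list."""
    return sum(weights[tag][f] for f in feats)

def train(sentences, epochs=5):
    """Multiclass perceptron: a simple linear stand-in for the SVM in the abstract."""
    weights = defaultdict(lambda: defaultdict(float))  # tag -> feature -> weight
    tags = sorted({t for _, gold in sentences for t in gold})
    for _ in range(epochs):
        for words, gold in sentences:
            for i, gold_tag in enumerate(gold):
                feats = features(words, i)
                pred = max(tags, key=lambda t: score(weights, t, feats))
                if pred != gold_tag:
                    # Move weights toward the gold tag and away from the wrong guess.
                    for f in feats:
                        weights[gold_tag][f] += 1.0
                        weights[pred][f] -= 1.0
    return weights, tags

def annotate(words, weights, tags, margin=1.0):
    """Tag each word independently; emit None when the top two scores are too close.

    The None entries are one way to picture a partially annotated sentence.
    """
    out = []
    for i in range(len(words)):
        feats = features(words, i)
        scored = sorted((score(weights, t, feats) for t in tags), reverse=True)
        best_tag = max(tags, key=lambda t: score(weights, t, feats))
        runner_up = scored[1] if len(scored) > 1 else float("-inf")
        out.append(best_tag if scored[0] - runner_up >= margin else None)
    return out

# Toy, made-up training data (not taken from the paper's corpus).
TRAIN = [
    ("i have a headache".split(), ["O", "O", "O", "SYMPTOM"]),
    ("she has a fever".split(), ["O", "O", "O", "SYMPTOM"]),
    ("take aspirin daily".split(), ["O", "DRUG", "O"]),
    ("he takes ibuprofen".split(), ["O", "O", "DRUG"]),
]

WEIGHTS, TAGS = train(TRAIN)
print(annotate("i have a fever".split(), WEIGHTS, TAGS))
```

Because each word is classified on its own, a word whose best tag does not clear the margin can simply be left for a human annotator while the remaining words keep their automatic tags.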
Related papers
Portability of Semantic Annotations for Fast Development of Dialogue Corpora
Generalization of spoken dialogue systems increases the need for fast development of spoken language understanding modules for semantic tagging of speaker’s turns. Statistical methods are performing well for this task but require large corpora to be trained. Collecting such corpora is expensive in time and human expertise. In this paper we propose a semi-automatic annotation process for fast pr...
Approche bayésienne de la composition sémantique dans les systèmes de dialogue oral
Focusing on the interpretation component of spoken dialog systems, this paper introduces a stochastic approach based on dynamic Bayesian networks to infer and compose semantic structures from speech. Word strings, basic concept sequences and composed semantic frames (as defined in the Berkeley FrameNet paradigm) are derived sequentially from the users’ inputs. A semi-automatic process provides ...
D3 Toolkit: A Development Toolkit for Daydreaming Spoken Dialog Systems
Recently, various data-driven spoken language technologies have been applied to spoken dialog system development. However, the high cost of maintaining spoken dialog systems is one of the biggest challenges. In addition, a fixed corpus collected by humans is never enough to cover the diversity of real users’ utterances. The concept of a daydreaming dialog system can solve the problem by making the system ...
Annotating Spoken Dialogs: From Speech Segments to Dialog Acts and Frame Semantics
We are interested in extracting semantic structures from spoken utterances generated within conversational systems. Current Spoken Language Understanding systems rely either on hand-written semantic grammars or on flat attribute-value sequence labeling. While the former approach is known to be limited in coverage and robustness, the latter lacks detailed relations amongst attribute-value pairs....
Semi-supervised Learning for Spoken Language Understanding Using Semantic Role Labeling
In a goal-oriented spoken dialog system, the major aim of language understanding is to classify utterances into one or more of the pre-defined intents and extract the associated named entities. Typically, the intents are designed by a human expert according to the application domain. Furthermore, these systems are trained using large amounts of data manually labeled using an already prepared la...
Publication date: 2004